Skip to content

bugfix: check sslhandshake result on immediate OK#532

Closed
tzssangglass wants to merge 1 commit into
openresty:masterfrom
tzssangglass:fix-sslhandshake-immediate-result
Closed

bugfix: check sslhandshake result on immediate OK#532
tzssangglass wants to merge 1 commit into
openresty:masterfrom
tzssangglass:fix-sslhandshake-immediate-result

Conversation

@tzssangglass
Copy link
Copy Markdown
Contributor

@tzssangglass tzssangglass commented Jun 2, 2026

This patch fixes the immediate FFI_OK path in tcpsock:sslhandshake().

The bug

Currently sslhandshake() treats the initial FFI_OK return from
ngx_http_lua_ffi_socket_tcp_sslhandshake() as the final successful result when
reused_session == false:

rc = ngx_http_lua_ffi_socket_tcp_sslhandshake(...)

if rc == FFI_OK then
    if reused_session == false then
        return true
    end

    rc = ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result(...)
end

This skips ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result() for the
reused_session == false case.

Why this is wrong

The first FFI call result and the final SSL handshake result are not the same
thing.

ngx_http_lua_ffi_socket_tcp_sslhandshake() internally calls
ngx_ssl_handshake(c), which returns NGX_OK when the TLS protocol handshake
completes successfully. This means the peer presented a valid certificate that
completed the TLS key exchange. The certificate was correctly formatted, signed
by its issuer, and the TLS record layer handshake finished.

However, after ngx_ssl_handshake(c) returns, the ngx_lua handler calls
SSL_get_verify_result() to check whether the certificate chain is trusted
according to the configured CA store. If the peer's CA is not in the trusted
store, the handler records the verify error:

u->error_ret = "unable to verify the first certificate"
u->openssl_error_code_ret = 21

But the outer C code only checks the original rc from ngx_ssl_handshake():

rc = ngx_ssl_handshake(c);

ngx_http_lua_ssl_handshake_handler(c);

if (rc == NGX_ERROR) {
    // rc is NGX_OK, so this is skipped
    return NGX_ERROR;
}

return NGX_OK;  // FFI_OK to Lua — but u->error_ret is set!

The C FFI returns FFI_OK even though certificate verification recorded a
failure. In Lua, the immediate FFI_OK path returns true directly, so the
caller believes TLS succeeded. The next socket operation then fails with a
misleading error (e.g. "closed" on write) instead of reporting the original
certificate verification failure.

Visual comparison

Old flow:

Lua sslhandshake()                  C
      |
      v
FFI: sslhandshake() ──────────► ngx_ssl_handshake(c) → NGX_OK
                                     |
                                     v
                                handler: SSL_get_verify_result() → 21
                                     |
                                     v
                                u->error_ret = "unable to verify..."
                                     |
                                     v
                                if (rc == NGX_ERROR) ... no
                                     |
                                     v
      ◄──────────────────────── return NGX_OK / FFI_OK
      |
      v
reused_session == false?
      |
      v
return true           ◄── final result never checked, error lost
      |
      v
caller believes TLS ok → sock:send() → "closed"

Fixed flow:

Lua sslhandshake()                  C
      |
      v
FFI: sslhandshake() ──────────► same as above, returns FFI_OK
      |
      v
FFI: get_result() ────────────► u->error_ret != NULL → NGX_ERROR
      |
      v
return nil, "21: unable to verify the first certificate"

The patch

rc = ngx_http_lua_ffi_socket_tcp_sslhandshake(...)

if rc == FFI_ERROR then
    return nil, err
end

if rc == FFI_DONE then
    return reused_session
end

local res

if rc == FFI_OK then
    res = ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result(...)
else
    assert(rc == FFI_AGAIN)
    co_yield()
    res = ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result(...)
end

if res == FFI_ERROR then
    return nil, ssl_err
end

assert(res == FFI_OK)

return true or session

Two changes:

  1. The immediate FFI_OK path now always reads the final result via
    ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result().
  2. Uses separate variables (rc for the first FFI call state, res for the
    final handshake result) so the two meanings do not overlap.

This keeps the existing async behavior unchanged, but makes the immediate
FFI_OK path follow the same final-result check as the resumed FFI_AGAIN
path.

#500 has not been completely resolved the #499

I hereby granted the copyright of the changes in this pull request
to the authors of this lua-resty-core project.

This patch fixes the immediate `FFI_OK` path in `tcpsock:sslhandshake()`.

Currently `sslhandshake()` treats the initial `FFI_OK` return from
`ngx_http_lua_ffi_socket_tcp_sslhandshake()` as the final successful result when
`reused_session == false`:

```lua
rc = ngx_http_lua_ffi_socket_tcp_sslhandshake(...)

if rc == FFI_OK then
    if reused_session == false then
        return true
    end

    rc = ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result(...)
end
```

This skips `ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result()` for the
`reused_session == false` case.

That is unsafe because the first FFI call result and the final SSL handshake
result are not the same thing.

The first return value tells Lua how the handshake operation progressed:

```text
FFI_ERROR  -> immediate call failed
FFI_DONE   -> reused session
FFI_OK     -> handshake operation completed immediately
FFI_AGAIN  -> handshake needs to yield and resume later
```

The final result must still be read from:

```text
ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result(...)
```

This matters when the C side completes the handshake path immediately but records
an SSL error in the cosocket state. For example, certificate verification can fail
inside the handshake handler and set:

```text
u->error_ret
u->openssl_error_code_ret
```

The old Lua code can miss that state.

Old flow:

```text
Lua sslhandshake()
      |
      v
C: ngx_http_lua_ffi_socket_tcp_sslhandshake(...)
      |
      v
returns FFI_OK
      |
      v
reused_session == false?
      |
      v
return true
      |
      v
final handshake result is never checked
```

So callers may believe TLS succeeded even though the C cosocket state already has
a handshake error. The next socket operation can then fail with a misleading error,
for example a write failure, instead of returning the original SSL handshake error.

The fixed flow always reads the final result for both immediate and async
completion paths:

```text
Lua sslhandshake()
      |
      v
C: ngx_http_lua_ffi_socket_tcp_sslhandshake(...)
      |
      +-----------------------------+
      |                             |
      v                             v
   FFI_OK                       FFI_AGAIN
immediate                    async handshake
completion                       |
      |                          v
      |                       co_yield()
      |                          |
      +------------+-------------+
                   |
                   v
        get_sslhandshake_result()
                   |
                   v
              final result
                   |
          +--------+--------+
          |                 |
          v                 v
      FFI_ERROR           FFI_OK
          |                 |
          v                 v
   return SSL error   return true/session
```

This patch separates the two meanings by using:

```text
rc   -> result of ngx_http_lua_ffi_socket_tcp_sslhandshake()
res  -> result of ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result()
```

The new control flow is:

```lua
rc = ngx_http_lua_ffi_socket_tcp_sslhandshake(...)

if rc == FFI_ERROR then
    return nil, err
end

if rc == FFI_DONE then
    return reused_session
end

local res

if rc == FFI_OK then
    res = ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result(...)
else
    assert(rc == FFI_AGAIN)
    co_yield()
    res = ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result(...)
end

if res == FFI_ERROR then
    return nil, ssl_err
end

assert(res == FFI_OK)

return true or session
```

This keeps the existing async behavior, but makes the immediate `FFI_OK` path
follow the same final-result check as the resumed `FFI_AGAIN` path.
Copilot AI review requested due to automatic review settings June 2, 2026 04:40
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors the sslhandshake result handling logic in the cosocket SSL handshake path, replacing a loop with a more linear flow.

Changes:

  • Simplified return paths for FFI_ERROR and FFI_DONE.
  • Replaced the while true polling loop with a single-yield flow and a res variable.
  • Centralized handling of ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result return code.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/resty/core/socket.lua
Comment on lines +439 to +460
local res

rc = C.ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result(r, u,
session_ptr, errmsg, openssl_error_code)
if rc == FFI_OK then
res = C.ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result(r, u,
session_ptr, errmsg, openssl_error_code)
else
assert(rc == FFI_AGAIN)

assert(rc == FFI_OK)
co_yield()

if session_ptr[0] == nil then
return session_ptr[0]
end
res = C.ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result(r, u,
session_ptr, errmsg, openssl_error_code)
end

return ffi_gc(session_ptr[0], C.ngx_http_lua_ffi_ssl_free_session)
if res == FFI_ERROR then
if openssl_error_code[0] ~= 0 then
return nil, openssl_error_code[0] .. ": " .. ffi_str(errmsg[0])
end
return nil, ffi_str(errmsg[0])
end

assert(rc == FFI_AGAIN)
assert(res == FFI_OK)
Comment thread lib/resty/core/socket.lua
Comment on lines +441 to +444
if rc == FFI_OK then
res = C.ngx_http_lua_ffi_socket_tcp_get_sslhandshake_result(r, u,
session_ptr, errmsg, openssl_error_code)
else
@tzssangglass
Copy link
Copy Markdown
Contributor Author

Another fix way: openresty/lua-nginx-module#2506

@tzssangglass
Copy link
Copy Markdown
Contributor Author

Reproduction

Because the bug requires the handshake to complete immediately in the kernel, we modify the C module to temporarily force the cosocket fd into blocking mode around SSL_do_handshake(). An external openssl s_server is used as the TLS peer (an in-process nginx server would deadlock on a blocking cosocket fd).

C module patch for reproduction (lua-nginx-module)

Apply this diff on v0.10.31rc2 (the Kong bundle version — it still uses the old if (rc == NGX_ERROR) check):

diff --git a/src/ngx_http_lua_socket_tcp.c b/src/ngx_http_lua_socket_tcp.c
--- a/src/ngx_http_lua_socket_tcp.c
+++ b/src/ngx_http_lua_socket_tcp.c
@@ -8,7 +8,7 @@
 #define DDEBUG 0
 #endif
 #include "ddebug.h"
-
+#include <fcntl.h>
 
 #include "ngx_http_lua_socket_tcp.h"
 #include "ngx_http_lua_input_filters.h"
@@ -2002,7 +2005,20 @@
 #endif
 
-    rc = ngx_ssl_handshake(c);
+    {
+        int sfd = c->fd;
+        int oflags = fcntl(sfd, F_GETFL, 0);
+
+        fcntl(sfd, F_SETFL, oflags & ~O_NONBLOCK);
+
+        rc = ngx_ssl_handshake(c);
+
+        fcntl(sfd, F_SETFL, oflags);
+    }
+
+    ngx_log_error(NGX_LOG_WARN, r->connection->log, 0,
+                  "sslhandshake forced-blocking rc=%d handshaked=%d",
+                  rc, c->ssl ? c->ssl->handshaked : -1);

What this does: fcntl(fd, F_SETFL, oflags & ~O_NONBLOCK) temporarily strips O_NONBLOCK from the cosocket fd. SSL_do_handshake() then blocks in the kernel until TLS negotiation finishes, returning 1. nginx's ngx_ssl_handshake() sets c->ssl->handshaked = 1 and returns NGX_OK. This triggers the immediate FFI_OK path in Lua.

External TLS server scripts

The TLS peer must be a separate process. An in-process nginx server would deadlock because the worker thread is blocked inside SSL_do_handshake() and cannot accept() new connections.

scripts/start-ssl-server.sh:

#!/bin/bash
set -e

PORT=${1:-12345}
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
CERT_DIR="$SCRIPT_DIR/../t/cert"
PID_FILE="$SCRIPT_DIR/../.openssl-s-server.pid"

if [ -f "$PID_FILE" ]; then
    OLD_PID=$(cat "$PID_FILE")
    if kill -0 "$OLD_PID" 2>/dev/null; then
        echo "openssl s_server already running (pid=$OLD_PID)"
        exit 0
    fi
fi

echo "Starting openssl s_server on 127.0.0.1:$PORT..."
openssl s_server \
    -cert "$CERT_DIR/mtls_server.crt" \
    -key "$CERT_DIR/mtls_server.key" \
    -accept "127.0.0.1:$PORT" \
    -quiet \
    -no_ssl3 \
    -no_tls1 \
    -no_tls1_1 \
    &
PID=$!
echo "$PID" > "$PID_FILE"
echo "openssl s_server started (pid=$PID)"

for i in $(seq 1 30); do
    if kill -0 "$PID" 2>/dev/null; then
        echo "Server ready."
        exit 0
    fi
    sleep 0.1
done
echo "ERROR: openssl s_server failed to start"
exit 1

scripts/stop-ssl-server.sh:

#!/bin/bash
set -e

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PID_FILE="$SCRIPT_DIR/../.openssl-s-server.pid"

if [ -f "$PID_FILE" ]; then
    PID=$(cat "$PID_FILE")
    if kill -0 "$PID" 2>/dev/null; then
        echo "Stopping openssl s_server (pid=$PID)..."
        kill "$PID"
        rm -f "$PID_FILE"
        echo "Stopped."
    else
        echo "openssl s_server (pid=$PID) not running, removing stale pid file."
        rm -f "$PID_FILE"
    fi
else
    echo "No pid file found."
fi

Certificate setup: The server presents mtls_server.crt (signed by mtls_ca.crt). The test client trusts test.crt (an unrelated self-signed CA that did not issue the server cert). With ssl_verify=true, certificate verification always fails, producing error code 21.

Test case (t/999-immediate-sslhandshake-ok.t)

use Test::Nginx::Socket::Lua;

repeat_each(2);
plan tests => repeat_each() * 5;

log_level 'debug';
no_long_string();

sub read_file {
    my $infile = shift;
    open my $in, $infile or die "cannot open $infile: $!";
    my $cert = do { local $/; <$in> };
    close $in;
    $cert;
}

our $WrongCA = read_file("t/cert/test.crt");
our $ServerPort = $ENV{TEST_NGINX_SSL_SERVER_PORT} || 12345;
our $HtmlDir = html_dir();

run_tests();

__DATA__

=== TEST 1: verify error surfaced on immediate sslhandshake
--- config eval
"
    lua_ssl_trusted_certificate $::HtmlDir/wrong-ca.crt;
    lua_ssl_verify_depth 4;

    location /t {
        content_by_lua_block {
            local sock = ngx.socket.tcp()
            local ok, err = sock:connect('127.0.0.1', $::ServerPort)
            if not ok then
                ngx.say('connect failed: ', err)
                return
            end

            local sess, err = sock:sslhandshake(false, 'example.com', true)
            if not sess then
                ngx.say('sslhandshake failed: ', err)
                return
            end

            ngx.say('sslhandshake ok: ', type(sess))

            local bytes, err = sock:send('GET / HTTP/1.0\\r\\nHost: example.com\\r\\n\\r\\n')
            if not bytes then
                ngx.say('send failed: ', err)
                return
            end

            ngx.say('send ok: ', bytes)
            sock:close()
        }
    }
"
--- user_files eval
">>> wrong-ca.crt
$::WrongCA
"
--- request
GET /t
--- response_body
sslhandshake failed: 21: unable to verify the first certificate
--- error_log eval
["sslhandshake forced-blocking rc=0 handshaked=1"]
--- no_error_log
[alert]
[emerg]

Running the test

  1. Apply the diff above to lua-nginx-module, then build nginx with this patched module.
  2. Start the external TLS server.
  3. Run the test with the old lua-resty-core → observe sslhandshake ok + send failed: closed.
  4. Switch to the fixed lua-resty-core → observe sslhandshake failed: 21: unable to verify....
cd lua-nginx-module

# Start external TLS server
bash scripts/start-ssl-server.sh

# Run test
eval "$(perl -I $HOME/perl5/lib/perl5 -Mlocal::lib)"
export PATH=$PWD/work/nginx/sbin:$PWD/work/bin:$PATH
prove -I../test-nginx/lib t/999-immediate-sslhandshake-ok.t

# Stop
bash scripts/stop-ssl-server.sh

@tzssangglass
Copy link
Copy Markdown
Contributor Author

Hi @zhuizhuhaomeng PTAL

@tzssangglass
Copy link
Copy Markdown
Contributor Author

Closed because the maintainer chose another fix way: openresty/lua-nginx-module#2506

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants